Search API Design Evaluation and Latency Budget
Learn approaches for meeting the non-functional requirements of the search API and for estimating its response time.
Introduction
When designing an API, optimizing one set of parameters may require sacrificing another because of the tradeoffs between them. In the preceding lessons, we explored various aspects of modeling a search API, focusing mainly on its functionality. In this lesson, we focus on the non-functional aspects of the search API and how we meet them.
Non-functional requirements
The non-functional requirements are discussed below.
Availability
We enhance the availability of the search API with rate limiting and API monitoring, which prevent our API and the back-end servers from being overwhelmed. Similarly, to avoid cascading failures among the internal services, we employ circuit breakers at various points. These improve not only the availability of our API but also its reliability.
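The circuit-breaker idea can be sketched in a few lines of Python. This is a minimal, single-threaded sketch under assumed defaults: the class name `CircuitBreaker`, the `failure_threshold` and `reset_timeout` parameters, and the injectable `clock` are hypothetical and not part of the search API described in this lesson. After a set number of consecutive failures, the breaker opens and rejects calls immediately, so callers fail fast instead of queuing up behind a service that is already struggling.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical, single-threaded circuit breaker for illustration.

    CLOSED: calls pass through; consecutive failures are counted.
    OPEN: after `failure_threshold` consecutive failures, calls fail fast.
    HALF_OPEN: once `reset_timeout` seconds have passed, calls are let
    through as trials; a success closes the circuit, a failure reopens it.
    """

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests can control time
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "CLOSED"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "HALF_OPEN"
        return "OPEN"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "OPEN":
            # Fail fast instead of piling more requests onto a failing service.
            raise CircuitOpenError("downstream service unavailable")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            raise
        # Success: reset the failure count and close the circuit.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping each call to an internal service this way (for example, `breaker.call(lambda: fetch_ads(query))`, where `fetch_ads` is a hypothetical client function) lets the search API degrade gracefully, such as returning results without ads, rather than blocking on a failing dependency.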
Scalability
We increase the scalability of our API by running redundant servers at the backend, so whenever one server is down, another is on standby to handle the search queries. We also cache the results of frequently searched queries. In addition, we use caching technologies, such as a CDN, between the client and our services to deliver static content. This reduces the load on our servers, so we can handle a larger number of queries.
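A cache for frequently searched queries can be sketched as an LRU map whose entries also expire after a fixed time-to-live. The names `QueryCache` and `cache_key`, the `capacity` and `ttl` parameters, and the query-normalization rule are all assumptions for illustration. The TTL bounds how stale a cached result can get, and LRU eviction keeps the most popular queries in memory.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class QueryCache:
    """Hypothetical LRU cache whose entries also expire after `ttl` seconds."""

    def __init__(self, capacity: int = 1000, ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.ttl = ttl
        self.clock = clock  # injectable so tests can control time
        self._data: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl:
            del self._data[key]  # stale entry: force a fresh search
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self.clock(), value)
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def cache_key(query: str, sort: str = "relevance", page: int = 1) -> tuple:
    # Hypothetical normalization: case and extra whitespace do not create new entries.
    return (" ".join(query.lower().split()), sort, page)
```

Including the sort order and page number in the key keeps different views of the same query from overwriting each other.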
Note: For more details on building scalable systems, see the Grokking Modern System Design Interview for Engineers & Managers course.
Security
We support TLS 1.2 and newer versions to provide a secure channel for our APIs to exchange data between client and server. Authentication for the search API can be handled in two ways:
A user without login: Since search is a public service, the requesting application (client) can be authenticated using only an API key.
A user with login: To receive tailored responses, end users can authenticate themselves with credentials such as a username and password. Alternatively, JWTs can be used to obtain a personalized experience from the search service.
Low latency
To reduce the latency of our search API, we use several techniques. For instance, we use high-speed caches in the API gateway to store the results of frequently searched queries that are generic and whose data is not updated instantly. On the server side, we set a maximum time limit for generating the results of each search query. If a query takes longer than this limit, its execution is halted, and the results found within the limit are returned to the user. Furthermore, we use pagination: results are fetched one page at a time instead of all at once, which reduces network latency because the full result set may span hundreds of pages. Finally, performing the filtering before passing the results to the search server reduces the overall latency, as explained in the previous lesson.
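Offset-based pagination driven by a `limit` parameter can be sketched as below. The names `paginate`, `Page`, and `MAX_LIMIT`, the `page`/`limit` semantics, and the cap of 50 results per page are assumptions for illustration. The sketch slices an in-memory list; a real search backend would fetch only the requested window of ranked results, which is where the latency saving comes from.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

MAX_LIMIT = 50  # hypothetical upper bound on page size


@dataclass
class Page:
    items: List[str]
    page: int
    limit: int
    total: int
    next_page: Optional[int]  # None when this is the last page


def paginate(results: Sequence[str], page: int = 1, limit: int = 10) -> Page:
    """Return one page of results instead of the full result set."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be positive")
    limit = min(limit, MAX_LIMIT)  # cap the response size
    start = (page - 1) * limit
    items = list(results[start:start + limit])
    has_more = start + limit < len(results)
    return Page(items, page, limit, len(results), page + 1 if has_more else None)
```

Capping `limit` on the server keeps a single request from asking for an oversized page and undoing the latency benefit.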
Point to Ponder
Question
If we set a time limit on searching a query on the server side, wouldn’t it affect the accuracy of the search results?
Yes, there is a tradeoff between latency and accuracy. However, queries that exceed the time limit are very rare; most queries are processed within the limit and return the desired results. The time limit should be long enough that accuracy is rarely affected, yet short enough to keep latency low.
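The server-side time limit, and the partial results it can produce, can be sketched as follows. It assumes, hypothetically, that the index is split into shards that are queried one after another; `search_with_deadline`, the shard callables, and the `time_limit` parameter are illustrative names only. The returned `complete` flag lets the API tell the client that the results may be partial.

```python
import time
from typing import Callable, Iterable, List, Tuple


def search_with_deadline(
    shards: Iterable[Callable[[str], List[str]]],
    query: str,
    time_limit: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], bool]:
    """Query shards one after another until the time budget is spent.

    Returns (results, complete); complete is False when the deadline cut
    the search short and only the results found so far are returned.
    The deadline is checked between shards, so one slow shard can still
    overrun it; a real system would also put a timeout on each shard call.
    """
    deadline = clock() + time_limit
    results: List[str] = []
    for search_shard in shards:
        if clock() >= deadline:
            return results, False  # budget exhausted: return what we have
        results.extend(search_shard(query))
    return results, True
```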
Achieving Non-Functional Requirements
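| Non-Functional Requirements | Approaches |
| Availability | Rate limiting, API monitoring, circuit breakers |
| Scalability | Redundant back-end servers, caching results of frequently searched queries, caching static content between the client and our services |
| Security | TLS 1.2 or newer, API keys for users without login, user credentials or JWTs for logged-in users |
| Low latency | Caching in the API gateway, time limit on query processing, pagination, filtering before passing results to the search server |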
Latency budget
In this section, we estimate the response time of our search API. The response time can vary depending on the message size, whether the response is cached, and whether the query uses simple or complex filters. Let's start by estimating the request and response sizes and then calculate the response time.
Note: As discussed in the Back-of-the-Envelope Calculations for Latency chapter, for GET requests the average RTT remains the same regardless of the data size (because the request is small), and the time to download the response increases by 0.4 milliseconds (ms) per KB.
Request size: Since a GET request has no body, we'll assume it to be 1.5 KB, accounting for the query parameters it carries, such as query, sort, and filter.
Response size: The response size mainly depends on the number of results on a page, that is, the limit parameter in pagination. Assume the response body includes ten search results, five recommendations, and two ads. If each result is 1 KB, the recommendations total 5 KB, and the ads total 5 KB, then the total size is:
$$\text{Size}_{\text{response}} = 10 \times 1\ \text{KB} + 5\ \text{KB} + 5\ \text{KB} = 20\ \text{KB}$$
Response time#
The following calculator estimates the latency and response time of the search service based on the response size.
Response Time Calculator of the Search API
| Response size | 20 | KB |
| Minimum latency | 198.5 | ms |
| Maximum latency | 279.5 | ms |
| Minimum response time | 202.5 | ms |
| Maximum response time | 291.5 | ms |
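Assuming the response size is 20 KB, the latency is the sum of the base time, the round-trip time of the GET request, and the download time of 0.4 ms per KB:
$$\text{Time}_{\text{latency}} = \text{Time}_{\text{base}} + \text{RTT}_{\text{GET}} + 0.4\ \text{ms/KB} \times \text{Size}_{\text{response}}$$
For a 20 KB response, the download term is 0.4 × 20 = 8 ms, which gives a minimum latency of 198.5 ms and a maximum latency of 279.5 ms, as shown in the calculator.
Similarly, the response time is the latency plus the processing time:
$$\text{Time}_{\text{response}} = \text{Time}_{\text{latency}} + \text{Time}_{\text{processing}}$$
Now, for the minimum response time, we use the minimum values of base time and processing time:
$$\text{Time}_{\text{response}}^{\min} = 198.5\ \text{ms} + 4\ \text{ms} = 202.5\ \text{ms}$$
For the maximum response time, we use the maximum values of base time and processing time:
$$\text{Time}_{\text{response}}^{\max} = 279.5\ \text{ms} + 12\ \text{ms} = 291.5\ \text{ms}$$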
Note: We assume that the minimum processing time is 4 ms (when the API calls to all services execute in parallel) and the maximum processing time is 12 ms (when the services do not execute in parallel). The details of these estimations are provided in the Back-of-the-Envelope Calculations for Latency chapter.
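The latency budget can be reproduced in a few lines of Python. This is a sketch under stated assumptions: the lesson reports only the latency totals, so the code folds the base time and the GET round-trip time into one fixed term per bound, derived from the calculator values (198.5 − 0.4 × 20 = 190.5 ms and 279.5 − 0.4 × 20 = 271.5 ms). The function names are hypothetical; the page composition and the 4 ms and 12 ms processing times come from this lesson.

```python
from typing import Tuple

# Fixed part of the latency (base time + GET RTT) in ms. The lesson gives
# only the totals, so these are derived from the calculator values:
# 198.5 - 0.4 * 20 = 190.5 and 279.5 - 0.4 * 20 = 271.5.
FIXED_MIN_MS = 190.5
FIXED_MAX_MS = 271.5
DOWNLOAD_MS_PER_KB = 0.4  # download time grows by 0.4 ms per KB
PROCESSING_MIN_MS = 4.0   # API calls to all services run in parallel
PROCESSING_MAX_MS = 12.0  # services do not run in parallel


def response_size_kb(results: int = 10, result_kb: float = 1.0,
                     recommendations_kb: float = 5.0, ads_kb: float = 5.0) -> float:
    """Size of one page: search results plus recommendations plus ads."""
    return results * result_kb + recommendations_kb + ads_kb


def latency_ms(size_kb: float) -> Tuple[float, float]:
    """(minimum, maximum) latency for a GET response of `size_kb` KB."""
    download = DOWNLOAD_MS_PER_KB * size_kb
    return FIXED_MIN_MS + download, FIXED_MAX_MS + download


def response_time_ms(size_kb: float) -> Tuple[float, float]:
    """(minimum, maximum) response time = latency + processing time."""
    lat_min, lat_max = latency_ms(size_kb)
    return lat_min + PROCESSING_MIN_MS, lat_max + PROCESSING_MAX_MS


print(response_time_ms(response_size_kb()))  # (202.5, 291.5)
```

Changing the `results` argument, for example `response_size_kb(results=20)`, shows how a larger pagination limit feeds through to the response time.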
A summary of the overall response time for the search service is shown in the illustration below.
In this lesson, we discussed how to meet the non-functional requirements by incorporating different techniques into our design. The calculations above also show that the designed API has low latency, with an estimated response time of roughly 202.5 ms to 291.5 ms for a 20 KB response.